 back propagation



Differential Informed Auto-Encoder

Zhang, Jinrui

arXiv.org Artificial Intelligence

If the physics formula is given in the form of differential equations, a physics-informed neural network can be built to solve it numerically on a global scale [5, PINN]. This process can be seen as a decoder that takes a sample point in the domain of the partial differential equations and solves the equations to obtain the corresponding output for each input point. If only a small, random amount of training data is available, we need to recover the differential relationships in the data in order to re-sample from the domain. This process can be viewed as an encoder that encodes the inner structure of the original data.
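A minimal sketch of the decoder view, assuming the toy ODE u'(x) = -u(x) with exact solution exp(-x); the residual below is the quantity a PINN would minimize over sampled collocation points. The function names are illustrative, not the paper's:

```python
import numpy as np

# Physics-informed residual for the ODE u'(x) = -u(x), u(0) = 1,
# whose exact solution is u(x) = exp(-x).  A PINN would minimize this
# residual over collocation points; here we only evaluate it for two
# candidate functions via central finite differences.

def pinn_residual(u, xs, h=1e-5):
    """Mean squared ODE residual |u'(x) + u(x)|^2 at collocation points."""
    du = (u(xs + h) - u(xs - h)) / (2 * h)   # central finite difference
    return np.mean((du + u(xs)) ** 2)

xs = np.linspace(0.0, 2.0, 50)               # collocation points ("decoder" inputs)
exact = lambda x: np.exp(-x)                 # true solution: near-zero residual
wrong = lambda x: 1.0 - x                    # poor candidate: large residual

print(pinn_residual(exact, xs))
print(pinn_residual(wrong, xs))
```

The exact solution gives a residual at the level of finite-difference error, while the poor candidate is penalized everywhere in the domain.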


The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks

Zhang, Gongyue, Liu, Honghai

arXiv.org Artificial Intelligence

Existing first-order optimizers fall mainly into two branches: classical optimizers represented by Stochastic Gradient Descent (SGD) and adaptive optimizers represented by Adam, along with their many derivatives. The debate over the merits and demerits of these two types of optimizers has persisted for a decade. In practical experience, SGD is generally considered more suitable for tasks such as Computer Vision (CV), while adaptive optimizers are widely used in tasks with sparse gradients, such as Large Language Models (LLMs). Although adaptive optimizers almost always offer faster convergence, they can lead to overfitting in some cases, resulting in poorer generalization than SGD on certain tasks. Even in Large Language Models, Adam continues to face challenges, and its original strategy may not always have an advantage once improvements such as gradient clipping are introduced. With such a wide variety of optimization methods available, it is essential to introduce a unified, interpretable theory. This paper works within the framework of first-order optimizers and, through the intervention of the balance theory, proposes for the first time a unified strategy that integrates all first-order optimization methods.
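For reference, the two update rules the abstract contrasts can be sketched as follows (my own illustration of textbook SGD and Adam, not the paper's unified strategy):

```python
import numpy as np

# One SGD step: move against the raw gradient.
def sgd_step(w, g, lr=0.1):
    return w - lr * g

# One Adam step: scale the step per coordinate by running estimates of
# the first moment (mean) and second moment (uncentered variance).
def adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # first-moment estimate
    v = b2 * v + (1 - b2) * g * g        # second-moment estimate
    m_hat = m / (1 - b1 ** t)            # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# One step on f(w) = w^2 (gradient g = 2w), starting from w = 1.0:
w0, g = 1.0, 2.0
print(sgd_step(w0, g))                   # 0.8
w1, m, v = adam_step(w0, g, m=0.0, v=0.0, t=1)
print(w1)                                # ~0.9: the step size is ~lr per coordinate
```

The contrast is visible even in one step: SGD's step scales with the gradient magnitude, while Adam's bias-corrected normalization makes the step roughly `lr` regardless of that magnitude, which is what helps with sparse gradients.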


End-to-end Learning of LDA by Mirror-Descent Back Propagation over a Deep Architecture

Neural Information Processing Systems

We develop a fully discriminative learning approach for the supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document. Different from traditional variational learning or Gibbs sampling approaches, the proposed learning method applies (i) the mirror descent algorithm for maximum a posteriori inference and (ii) back propagation over a deep architecture together with stochastic gradient/mirror descent for model parameter estimation, leading to scalable and end-to-end discriminative learning of the model. As a byproduct, we also apply this technique to develop a new learning method for the traditional unsupervised LDA model (i.e., BP-LDA). Experimental results on three real-world regression and classification tasks show that the proposed methods significantly outperform previous supervised topic models and neural networks, and are on par with deep neural networks.
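The mirror-descent building block mentioned here can be sketched in a few lines. This is a hedged illustration of an entropic (exponentiated-gradient) step, which keeps a probability vector, such as a document's topic proportions, on the simplex; it is not the paper's full BP-sLDA inference:

```python
import numpy as np

def mirror_descent_step(theta, grad, lr=0.5):
    """One entropic mirror-descent step on the probability simplex."""
    theta = theta * np.exp(-lr * grad)   # multiplicative (exponentiated-gradient) update
    return theta / theta.sum()           # renormalize back onto the simplex

theta = np.array([0.25, 0.25, 0.5])      # e.g. topic proportions of a document
grad = np.array([1.0, 0.0, -1.0])        # gradient of some loss w.r.t. theta
theta = mirror_descent_step(theta, grad)
print(theta, theta.sum())                # still a valid probability distribution
```

Unlike an additive gradient step followed by projection, the multiplicative form never leaves the simplex, which is why it composes cleanly into the layers of a deep unrolled architecture.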


Convergence Acceleration of Markov Chain Monte Carlo-based Gradient Descent by Deep Unfolding

Hagiwara, Ryo, Takabe, Satoshi

arXiv.org Machine Learning

The proposed solver is based on the Ohzeki method that combines Markov chain Monte Carlo (MCMC) and gradient descent, and its step sizes are trained by minimizing a loss function. In the training process, we propose a sampling-based gradient estimation that substitutes auto-differentiation with a variance estimation, thereby circumventing the failure of back propagation due to the non-differentiability of MCMC. The numerical results for a few COPs demonstrate that the proposed solver significantly accelerates convergence compared with the original Ohzeki method. Combinatorial optimization problems (COPs) comprising discrete variables are considered hard to solve exactly in polynomial time, which relates to the well-known P vs. NP problem. Along with deterministic approximation algorithms, samplers such as Markov chain Monte Carlo have been applied to COPs. However, the convergence time for obtaining reasonable approximate solutions is long.
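The general trick of estimating a gradient from samples instead of differentiating through the sampler can be sketched with a Gibbs distribution. For p_beta(x) ∝ exp(-beta·E(x)), the identity d/dbeta E[f(x)] = -Cov(f(x), E(x)) gives the gradient from samples alone. This is loosely analogous to the variance-based estimator the abstract describes, not the paper's exact method:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_wrt_beta(samples, f, E):
    """Sample estimate of d/dbeta E[f(x)] = -Cov(f(x), E(x)) under a Gibbs law."""
    fx, Ex = f(samples), E(samples)
    return -np.mean((fx - fx.mean()) * (Ex - Ex.mean()))

# Toy discrete energy over states {0, 1, 2}; draw exact Gibbs samples.
energies = np.array([0.0, 1.0, 2.0])
beta = 1.0
p = np.exp(-beta * energies)
p /= p.sum()
samples = rng.choice(3, size=200_000, p=p)

g = grad_wrt_beta(samples, f=lambda s: energies[s], E=lambda s: energies[s])
print(g)   # with f = E this is -Var(E): negative, since raising beta lowers expected energy
```

Nothing here is differentiated through the sampling step, which is the point: the same estimate works when the samples come from a non-differentiable MCMC chain.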


A Novel Method for improving accuracy in neural network by reinstating traditional back propagation technique

R, Gokulprasath

arXiv.org Artificial Intelligence

Deep learning has revolutionized the field of artificial intelligence by enabling machines to learn complex patterns and perform tasks that were previously deemed impossible. However, training deep neural networks is a challenging and computationally expensive task that requires optimizing millions or even billions of parameters. The back propagation algorithm has been the go-to method for training deep neural networks for decades [5], but it suffers from limitations such as slow convergence and the vanishing gradient problem. To overcome these limitations, several alternative training methods have been proposed, such as standard back propagation and Direct Feedback Alignment. The core idea of this approach is to update the weights and biases in each layer of a neural network using the local error at that layer, rather than back propagating the error from the output layer to the input layer [2]. By doing so, the training process can be accelerated and the model's accuracy can be improved.
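The layer-local idea can be sketched in the spirit of Direct Feedback Alignment: the hidden layer is updated from the output error routed through a fixed random matrix B rather than through the transpose of the forward weights. All names, sizes, and data below are my own illustration, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(8, 4))                    # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy binary targets
W1 = rng.normal(0, 0.5, (4, 6))
W2 = rng.normal(0, 0.5, (6, 1))
B = rng.normal(0, 0.5, (1, 6))                 # fixed random feedback path
lr, losses = 0.5, []

for _ in range(300):
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    e = out - y
    losses.append(float(np.mean(e ** 2)))
    d2 = e * out * (1 - out)                   # local delta at the output layer
    d1 = (e @ B) * h * (1 - h)                 # hidden delta via fixed B, not W2.T
    W2 -= lr * h.T @ d2
    W1 -= lr * X.T @ d1

print(losses[0], losses[-1])                   # compare initial and final loss
```

The hidden-layer update never waits on, or reuses, the forward weights of the layer above, which is what makes such schemes parallelizable across layers.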


Generalization of Back propagation to Recurrent and Higher Order Neural Networks

Neural Information Processing Systems

The propagation of activation in these networks is determined by dissipative differential equations. The error signal is backpropagated by integrating an associated differential equation. The method is introduced by applying it to the recurrent generalization of the feedforward backpropagation network. It is then extended to the case of higher-order networks and to a constrained dynamical system for training a content-addressable memory. The essential feature of the adaptive algorithms is that the adaptive equation has a simple outer-product form.
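The core idea, backpropagating the error by integrating an associated (adjoint) differential equation, can be sketched on a toy system. Below, dx/dt = -p·x with loss L = 0.5·x(T)², and the gradient dL/dp obtained from the adjoint ODE is checked against a finite difference. This is an illustration of the general adjoint technique, not the paper's algorithm:

```python
import numpy as np

def forward(p, x0=1.0, T=1.0, n=2000):
    """Forward Euler integration of dx/dt = -p*x, storing the trajectory."""
    dt, x, xs = T / n, x0, [x0]
    for _ in range(n):
        x += dt * (-p * x)
        xs.append(x)
    return xs, dt

def loss(p):
    xs, _ = forward(p)
    return 0.5 * xs[-1] ** 2

def adjoint_grad(p):
    xs, dt = forward(p)
    lam, g = xs[-1], 0.0                # lam(T) = dL/dx(T) = x(T)
    for k in range(len(xs) - 1, 0, -1): # integrate the adjoint backward in time
        g += dt * lam * (-xs[k])        # accumulate lam * df/dp, with df/dp = -x
        lam -= dt * p * lam             # d lam/dt = -lam * df/dx = p * lam
    return g

p = 0.7
g = adjoint_grad(p)
fd = (loss(p + 1e-5) - loss(p - 1e-5)) / 2e-5
print(g, fd)                            # both approximate -T * x0^2 * exp(-2*p*T)
```

The backward sweep plays the role of backpropagation: the adjoint variable carries the output error backward through the dynamics, and the gradient accumulates as a simple product of adjoint and state, mirroring the outer-product form noted in the abstract.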


Induction of Multiscale Temporal Structure

Neural Information Processing Systems

Learning structure in temporally extended sequences is a difficult computational problem because only a fraction of the relevant information is available at any instant. Although variants of back propagation can in principle be used to find structure in sequences, in practice they are not sufficiently powerful to discover arbitrary contingencies, especially those spanning long temporal intervals or involving high-order statistics. For example, in designing a connectionist network for music composition, we have encountered the problem that the net is able to learn musical structure that occurs locally in time, e.g., relations among notes within a musical phrase, but not structure that occurs over longer time periods, e.g., relations among phrases. To address this problem, we require a means of constructing a reduced description of the sequence that makes global aspects more explicit or more readily detectable. I propose to achieve this using hidden units that operate with different time constants.
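A hidden unit with time constant tau acts as a leaky integrator, so slow units retain a reduced description of the distant past while fast units track local structure. A minimal sketch of that contrast (an illustration of the mechanism, not the paper's network):

```python
import numpy as np

def leaky_trace(x, tau):
    """Leaky integrator: h[t] = h[t-1] + (x[t] - h[t-1]) / tau."""
    h, out = 0.0, []
    for v in x:
        h += (v - h) / tau
        out.append(h)
    return np.array(out)

x = np.zeros(100)
x[10] = 1.0                              # a single input event
fast = leaky_trace(x, tau=2.0)           # forgets the event within a few steps
slow = leaky_trace(x, tau=50.0)          # still carries a trace 50 steps later
print(fast[60], slow[60])
```

Fifty steps after the event, the fast unit's trace is essentially zero while the slow unit still responds, which is exactly the property that makes longer-range contingencies detectable.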


The deep learning project which led me to burnout

#artificialintelligence

In this article, I will present the deep learning project that I wanted to carry out, then the techniques and approach I used to tackle it. I will end the article with some meaningful reflections that I hope will help some of you. I wanted to build a smartphone app that can recognize flowers from a picture. Basically, the app is split into two parts; the front-end part is essentially the mobile development. I wanted to build a deep learning model from scratch, without a deep learning framework, to help me understand the inner workings of image classification (I know it sounds crazy).


Back Propagation. Backpropagation is a popular algorithm…

#artificialintelligence

Backpropagation is a popular algorithm used for training neural networks. Here, X is the input data, y is the corresponding output data, hidden_layer_size is the number of neurons in the hidden layer, learning_rate is the learning rate, and num_iterations is the number of iterations to train the model for. The sigmoid() function computes the sigmoid activation function. First, we define the sigmoid activation function, which takes an input value x and returns the output of the sigmoid function. Next, we define the derivative of the sigmoid function, which takes an input value x and returns the derivative of the sigmoid with respect to x.
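The excerpt describes code that is not shown. Below is a minimal reconstruction under the stated assumptions: one hidden layer, sigmoid activations throughout, and the parameter names taken from the text; the network shapes and the XOR usage example are my own additions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def train(X, y, hidden_layer_size, learning_rate, num_iterations, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (X.shape[1], hidden_layer_size))
    W2 = rng.normal(0, 1, (hidden_layer_size, y.shape[1]))
    for _ in range(num_iterations):
        # forward pass
        z1 = X @ W1; a1 = sigmoid(z1)
        z2 = a1 @ W2; a2 = sigmoid(z2)
        # backward pass: propagate the error from the output to the hidden layer
        d2 = (a2 - y) * sigmoid_derivative(z2)
        d1 = (d2 @ W2.T) * sigmoid_derivative(z1)
        W2 -= learning_rate * a1.T @ d2
        W1 -= learning_rate * X.T @ d1
    return W1, W2

# Usage example: learn XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, W2 = train(X, y, hidden_layer_size=4, learning_rate=1.0, num_iterations=5000)
pred = sigmoid(sigmoid(X @ W1) @ W2)
print(np.round(pred, 2).ravel())
```

Biases are omitted here for brevity; a faithful implementation would add a bias vector per layer and update it from the same deltas.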